Phase 3: Base Analysis - ChartsMaze EDL Pipeline

Overview

Phase 3 is the critical synthesis stage where all data from Phase 1 and Phase 2 is merged into a single unified JSON structure. This phase runs a single script that produces the base all_stocks_fundamental_analysis.json file.

If bulk_market_analyzer.py fails, the pipeline stops. Phase 4 scripts cannot proceed without the base JSON file.

Execution Order

Phase 3 runs one critical script:

Build Master JSON

Script: bulk_market_analyzer.py
Merges fundamental data, technical indicators, and listing dates into a unified structure.

Script: bulk_market_analyzer.py

Purpose

Merges data from multiple sources to create the base JSON with 86 fields per stock across 2,775 stocks.

Input Files

fundamental_data.json              (Phase 1)
master_isin_map.json               (Phase 1)
dhan_data_response.json            (Phase 1)
advanced_indicator_data.json       (Phase 2)
nse_equity_list.csv                (Phase 1)

Data Merging Process

The script iterates through all stocks and merges data sections:

for item in data:
    symbol = item.get("Symbol", "UNKNOWN")
    tech = dhan_tech_map.get(symbol, {})
    adv_tech = adv_tech_map.get(symbol, {})
    
    # Extract quarterly data
    cq = item.get("incomeStat_cq", {})
    cy = item.get("incomeStat_cy", {})
    ttm_cy = item.get("TTM_cy", {})
    cv = item.get("CV", {})
    roce_roe = item.get("roce_roe", {})
    
    # Build unified record
    analyzed_data.append({
        "Symbol": symbol,
        "Name": tech.get("DispSym"),
        "Market Cap(Cr.)": tech.get("Mcap"),
        "P/E": cv.get("PE"),
        "ROE(%)": roce_roe.get("ROE"),
        # ... 81 more fields
    })
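
The loop above depends on dhan_tech_map and adv_tech_map being keyed by symbol, so each stock needs only one dictionary lookup per source. The source does not show how these maps are built. The following is a minimal sketch, assuming each input file is a JSON list of records; the helper name build_symbol_map and the "Sym" key are hypothetical.

```python
from typing import Any


def build_symbol_map(records: list[dict[str, Any]], key: str) -> dict[str, dict[str, Any]]:
    """Index a list of records by the value stored under `key`.

    Records without the key are skipped; if a symbol appears twice,
    the later record wins.
    """
    symbol_map: dict[str, dict[str, Any]] = {}
    for record in records:
        symbol = record.get(key)
        if symbol:
            symbol_map[symbol] = record
    return symbol_map


# Hypothetical sample rows; the "Sym" key name is an assumption.
dhan_rows = [
    {"Sym": "RELIANCE", "DispSym": "Reliance Industries Ltd.", "Mcap": 1825000},
    {"Sym": "TCS", "DispSym": "Tata Consultancy Services Ltd."},
    {"DispSym": "Row without a symbol"},
]
dhan_tech_map = build_symbol_map(dhan_rows, "Sym")

# Unknown symbols fall back to an empty dict, so later .get() calls return None.
tech = dhan_tech_map.get("INFY", {})
```

With maps built this way, a symbol that is missing from one source still produces a record; its fields from that source are simply None.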

Output Files

File	Description	Size	Records
`all_stocks_fundamental_analysis.json`	Base master JSON	~45 MB	2,775

Output Structure

{
  "Symbol": "RELIANCE",
  "Name": "Reliance Industries Ltd.",
  "Listing Date": "29-NOV-1977",
  "Basic Industry": "Petroleum Products",
  "Sector": "Oil, Gas & Consumable Fuels",
  "Index": "NIFTY 50, NIFTY ENERGY",
  
  "Market Cap(Cr.)": "1825000",
  "Stock Price(₹)": "2468.75",
  "P/E": "28.5",
  "ROE(%)": "8.2",
  "ROCE(%)": "9.1",
  "D/E": "0.52",
  
  "Latest Quarter": "Dec-25",
  "Net Profit Latest Quarter": "17594",
  "Net Profit Previous Quarter": "16446",
  "EPS Latest Quarter": "26.10",
  "EPS Previous Quarter": "24.40",
  "Sales Latest Quarter": "245000",
  "OPM Latest Quarter(%)": "12.5",
  
  "QoQ Net Profit Change(%)": "7.0",
  "YoY Net Profit Change(%)": "43.3",
  
  "RSI (14)": "62.5",
  "SMA Status": "SMA 20: Above (4.9%) | SMA 50: Above (24.1%)",
  "EMA Status": "EMA 20: Above (6.3%) | EMA 200: Above (72.6%)",
  "Technical Sentiment": "RSI: Neutral | MACD: Bearish",
  "Pivot Point": "245.50",
  
  "1 Day Returns(%)": "1.2",
  "1 Week Returns(%)": "3.5",
  "1 Month Returns(%)": "8.2",
  "3 Month Returns(%)": "15.6",
  "1 Year Returns(%)": "45.3",
  "% from 52W High": "-5.2",
  "% from 52W Low": "72.8",
  
  "FII % change QoQ": "0.5",
  "DII % change QoQ": "-0.3",
  "Free Float(%)": "50.4"
}

Data Processing Steps

Step 1: Load All Data Sources

print("Loading fundamental data...")
with open(input_file, "r") as f:
    data = json.load(f)

print(f"Loaded technical data for {len(dhan_tech_map)} symbols.")
print(f"Loaded advanced indicators for {len(adv_tech_map)} symbols.")
print(f"Loaded listing dates for {len(listing_date_map)} symbols.")
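
The excerpt reports the size of listing_date_map but does not show how it is filled from nse_equity_list.csv. One possible approach is sketched below. The function name load_listing_dates and the column names "SYMBOL" and "DATE OF LISTING" are assumptions about the CSV layout, not something the source confirms.

```python
import csv
import io


def load_listing_dates(csv_file) -> dict[str, str]:
    """Map each symbol to its listing date from an open CSV file object."""
    listing_date_map: dict[str, str] = {}
    for row in csv.DictReader(csv_file):
        # Strip whitespace from headers and values; padded columns are common.
        clean = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        symbol = clean.get("SYMBOL")
        date = clean.get("DATE OF LISTING")
        if symbol and date:
            listing_date_map[symbol] = date
    return listing_date_map


# In-memory sample standing in for nse_equity_list.csv (assumed header layout).
sample = io.StringIO(
    "SYMBOL,NAME OF COMPANY, SERIES, DATE OF LISTING\n"
    "RELIANCE,Reliance Industries Limited,EQ,29-NOV-1977\n"
)
listing_date_map = load_listing_dates(sample)
print(f"Loaded listing dates for {len(listing_date_map)} symbols.")
```

Against the real file, the same function would be called inside a with open(...) block, ideally with newline="" as the csv module recommends.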

Step 2: Extract Quarterly Fundamentals

# Parse pipe-separated quarterly values
net_profit_latest = get_value_from_pipe_string(cq.get("Net_Profit"), 0)
net_profit_prev = get_value_from_pipe_string(cq.get("Net_Profit"), 1)
net_profit_2q = get_value_from_pipe_string(cq.get("Net_Profit"), 2)
net_profit_3q = get_value_from_pipe_string(cq.get("Net_Profit"), 3)
net_profit_last_yr = get_value_from_pipe_string(cq.get("Net_Profit"), 4)

# Calculate QoQ and YoY changes
qoq_change = calculate_change(net_profit_latest, net_profit_prev)
yoy_change = calculate_change(net_profit_latest, net_profit_last_yr)

Step 3: Merge Technical Indicators

# From dhan_data_response.json
rsi = tech.get("DayRSI14CurrentCandle", 0)
sma_50_distance = tech.get("DaySMA50CurrentCandle", 0)
sma_200_distance = tech.get("DaySMA200CurrentCandle", 0)

# From advanced_indicator_data.json
pivot_point = adv_tech.get("PivotPoint")
ema_status = adv_tech.get("EMA_Status")
sma_status = adv_tech.get("SMA_Status")
technical_sentiment = adv_tech.get("Technical_Sentiment")

Step 4: Build Unified Record

analyzed_data.append({
    # Identity
    "Symbol": symbol,
    "Name": tech.get("DispSym"),
    "Listing Date": listing_date_map.get(symbol, "N/A"),
    
    # Fundamentals
    "Net Profit Latest Quarter": net_profit_latest,
    "QoQ Net Profit Change(%)": qoq_change,
    "YoY Net Profit Change(%)": yoy_change,
    
    # Technical
    "RSI (14)": rsi,
    "SMA Status": sma_status,
    "Pivot Point": pivot_point,
    
    # Placeholders for Phase 4
    "RVOL": 0,
    "5 Days MA ADR(%)": 0,
    "% from ATH": 0,
    "Event Markers": [],
    "Recent Announcements": [],
    "News Feed": []
})

Step 5: Save Output

with open(output_file, "w") as f:
    json.dump(analyzed_data, f, indent=4)

print(f"✅ Analysis complete. Saved {len(analyzed_data)} stocks to {output_file}")

Field Calculation Examples

Quarterly Changes (QoQ, YoY)

def calculate_change(current, previous):
    if previous == 0:
        return 0.0
    return ((current - previous) / abs(previous)) * 100

# Example: Net Profit QoQ
# Latest: 17594, Previous: 16446
qoq = calculate_change(17594, 16446)  # ((17594 - 16446) / 16446) * 100 ≈ 7.0
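
Dividing by abs(previous) rather than previous matters when the earlier quarter was a loss. The short example below repeats the function so that it runs on its own and compares it with a signed denominator. The input values are illustrative, not taken from the dataset.

```python
def calculate_change(current: float, previous: float) -> float:
    """Percentage change from previous to current, relative to |previous|."""
    if previous == 0:
        return 0.0  # No meaningful base, so report no change
    return ((current - previous) / abs(previous)) * 100


# A loss of 100 turning into a profit of 50 is an improvement: positive change.
loss_to_profit = calculate_change(50, -100)

# The same inputs divided by the signed base would flip the sign.
naive = ((50 - (-100)) / -100) * 100

print(loss_to_profit, naive)
```

Note that the zero guard returns 0.0, so a move from exactly zero profit to any profit is reported as no change; downstream consumers may want to treat that case separately.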

Pipe String Parsing

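def get_value_from_pipe_string(pipe_string, index):
    # Input: "17594|16446|15138|17955|12273"
    # Index 0 → 17594 (Latest Quarter)
    # Index 1 → 16446 (Previous Quarter)
    # Index 4 → 12273 (Last Year Same Quarter)

    # cq.get(...) returns None when the field is absent, so guard before splitting
    if not pipe_string:
        return 0.0
    parts = pipe_string.split('|')
    if index < len(parts):
        try:
            return float(parts[index])
        except ValueError:
            # Non-numeric entry (for example an empty segment)
            return 0.0
    return 0.0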
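
Step 2 calls the parser once per quarter on the same string. A variant that splits the string once and returns all quarters together is sketched here. parse_quarters is a hypothetical helper, and defaulting missing or non-numeric entries to 0.0 is an assumption that follows the 0.0 fallback of the per-index parser.

```python
def parse_quarters(pipe_string, count=5):
    """Split a pipe-separated string once and return `count` floats.

    Missing or non-numeric entries become 0.0.
    """
    parts = pipe_string.split("|") if pipe_string else []
    values = []
    for i in range(count):
        try:
            values.append(float(parts[i]))
        except (IndexError, ValueError):
            values.append(0.0)
    return values


# Index 0 is the latest quarter; index 4 is the same quarter last year.
latest, prev, two_q, three_q, last_year = parse_quarters(
    "17594|16446|15138|17955|12273"
)

# A stock with only two reported quarters is padded with zeros.
short = parse_quarters("17594|16446")
```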

SMA/EMA Status Formatting

# From advanced_indicator_data.json:
{
  "SMA_20_Distance": 4.9,
  "SMA_50_Distance": 24.1,
  "EMA_20_Distance": 6.3,
  "EMA_200_Distance": 72.6
}

# Formatted output:
"SMA Status": "SMA 20: Above (4.9%) | SMA 50: Above (24.1%)"
"EMA Status": "EMA 20: Above (6.3%) | EMA 200: Above (72.6%)"

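The mapping from distance values to the status strings above can be sketched as a small formatter. format_ma_status is a hypothetical name. The "Below" label for negative distances and the one-decimal rounding are assumptions, since the source only shows positive examples.

```python
def format_ma_status(prefix: str, distances: dict[int, float]) -> str:
    """Render moving-average distances as 'SMA 20: Above (4.9%) | ...'."""
    parts = []
    for period, distance in distances.items():
        side = "Above" if distance >= 0 else "Below"
        parts.append(f"{prefix} {period}: {side} ({abs(distance):.1f}%)")
    return " | ".join(parts)


indicators = {
    "SMA_20_Distance": 4.9,
    "SMA_50_Distance": 24.1,
    "EMA_20_Distance": 6.3,
    "EMA_200_Distance": 72.6,
}

sma_status = format_ma_status(
    "SMA", {20: indicators["SMA_20_Distance"], 50: indicators["SMA_50_Distance"]}
)
ema_status = format_ma_status(
    "EMA", {20: indicators["EMA_20_Distance"], 200: indicators["EMA_200_Distance"]}
)
```
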
Dependencies

Required from Phase 1

fundamental_data.json (CRITICAL)
master_isin_map.json (for iteration)
dhan_data_response.json (for technical data)
nse_equity_list.csv (for listing dates)

Required from Phase 2

advanced_indicator_data.json (for SMA/EMA/Pivot)

Optional (Soft Dependencies)

If any input other than fundamental_data.json is missing, the script continues, but the fields sourced from that file are left empty or 0.

Typical Execution Time

~30-60 seconds. The script only merges data in memory and makes no API calls.

Performance Breakdown

Load all JSONs: ~5s
Iterate 2,775 stocks: ~20s
Write output JSON: ~10s

Error Handling

Critical Failure Detection

results["bulk_market_analyzer.py"] = run_script("bulk_market_analyzer.py", "Phase 3")

if not results["bulk_market_analyzer.py"]:
    print("🛑 CRITICAL: bulk_market_analyzer.py failed.")
    print("   Cannot produce all_stocks_fundamental_analysis.json.")
    return  # Pipeline stops
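
The orchestrator snippet calls run_script without showing its body. A minimal sketch of such a helper follows, assuming each phase script is run in a child Python interpreter and that a zero exit code means success. The real implementation may log, time, or retry differently.

```python
import subprocess
import sys


def run_script(script_name: str, phase: str) -> bool:
    """Run a pipeline script in a child interpreter; True means exit code 0."""
    print(f"[{phase}] Running {script_name}...")
    completed = subprocess.run([sys.executable, script_name])
    return completed.returncode == 0
```

Because a missing script file also makes the interpreter exit with a non-zero code, the caller's `if not results[...]` check covers both a crash and a wrong path.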

Non-Critical Missing Files

try:
    with open(ADVANCED_FILE, "r") as f:
        adv_data = json.load(f)
except FileNotFoundError:
    print(f"Warning: {ADVANCED_FILE} not found. Running without advanced indicators.")
    adv_tech_map = {}  # Empty map, fields will be null
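
The same pattern can be shared by every soft dependency through one helper. This is a sketch under assumptions: load_json_or_default is a hypothetical name, and treating malformed JSON the same way as a missing file is a design choice the source does not state.

```python
import json


def load_json_or_default(path, default):
    """Return parsed JSON from `path`, or `default` if it is missing or invalid."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"Warning: {path} not found. Continuing with defaults.")
    except json.JSONDecodeError as exc:
        print(f"Warning: {path} is not valid JSON ({exc}). Continuing with defaults.")
    return default


# Falls back to an empty list when the Phase 2 output is absent.
adv_data = load_json_or_default("advanced_indicator_data.json", [])
```

The critical input, fundamental_data.json, should not go through this helper, since a failure there is meant to stop the pipeline.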

Output Validation

File Size Check

ls -lh all_stocks_fundamental_analysis.json
# Expected: ~45 MB (2,775 stocks × 86 fields)

Record Count Check

import json

with open("all_stocks_fundamental_analysis.json", "r") as f:
    data = json.load(f)

print(f"Total stocks: {len(data)}")  # Expected: 2775
print(f"Fields per stock: {len(data[0].keys())}")  # Expected: 86

Field Completeness Check

required_fields = [
    "Symbol", "Name", "Market Cap(Cr.)", "P/E", "ROE(%)",
    "Net Profit Latest Quarter", "EPS Latest Quarter",
    "RSI (14)", "SMA Status", "1 Year Returns(%)"
]

for stock in data:
    missing = [f for f in required_fields if f not in stock]
    if missing:
        print(f"{stock['Symbol']}: Missing fields {missing}")

What Phase 3 Does NOT Include

The base JSON does not yet carry real values for the following groups. Phase 3 writes them as placeholders (0 or empty arrays), and Phase 4 populates them:

Advanced Metrics: RVOL, ADR, ATH, Turnover, 200D EMA Volume
Earnings Performance: Returns since Earnings, Max Returns since Earnings
F&O Data: F&O Flag, Lot Size, Next Expiry
Event Markers: Surveillance, Insider Trading, Block Deals, Corporate Actions
Recent Announcements: Top 5 regulatory filings with PDF links
News Feed: Top 5 media news items with sentiment
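
A quick way to confirm that a record is still in its Phase 3 state is to compare the placeholder fields against their defaults. The sketch below covers only the six placeholders shown in Step 4; the full list of 26 is not given in this page, and non_default_placeholders is a hypothetical helper.

```python
PLACEHOLDER_DEFAULTS = {
    "RVOL": 0,
    "5 Days MA ADR(%)": 0,
    "% from ATH": 0,
    "Event Markers": [],
    "Recent Announcements": [],
    "News Feed": [],
}


def non_default_placeholders(stock: dict) -> list[str]:
    """List placeholder fields that are missing or already hold real values."""
    return [
        field
        for field, default in PLACEHOLDER_DEFAULTS.items()
        if field not in stock or stock[field] != default
    ]


# A fresh Phase 3 record reports nothing; an enriched one reports what changed.
fresh = {"Symbol": "RELIANCE", **PLACEHOLDER_DEFAULTS}
enriched = {**fresh, "RVOL": 1.8}
```

An empty result for every stock right after Phase 3 suggests the base file has not been touched by any Phase 4 script yet.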

Phase 3 Output Summary

Files Produced

📦 Phase 3 Output:
└─ all_stocks_fundamental_analysis.json   (~45 MB, 2,775 records, 86 fields each)

Field Coverage

Complete: 60 fields (fundamentals, technicals, ratios, returns)
Placeholders: 26 fields (to be populated in Phase 4)

Next Phase

Once Phase 3 completes, the pipeline proceeds to:

Phase 4: Enrichment Injection

Modifies the master JSON in-place to inject advanced metrics, F&O data, earnings performance, and event markers. Order matters!
